129 research outputs found

    Galois Connections between Semimodules and Applications in Data Mining

    In [1] a generalisation of Formal Concept Analysis, K-Formal Concept Analysis, was introduced with data mining applications in mind; in it, incidences take values in certain kinds of semirings instead of the standard Boolean carrier set. A fundamental result was missing there, namely the second half of the equivalent of the main theorem of Formal Concept Analysis. In this continuation we introduce the structural lattice of such generalised contexts, providing a limited equivalent to the main theorem of K-Formal Concept Analysis, which allows the standard version to be interpreted as a privileged case in yet another direction. We motivate our results by providing instances of their use to analyse the confusion matrices of multiple-input multiple-output classifiers.

    Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web

    The Internet Protocol (IP) environment introduces two relevant sources of distortion into the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant benefits. First, the recognition system is only affected by the quantization distortion of the spectral envelope, so we avoid the influence of other sources of distortion due to the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective since it is not constrained by the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, which is one of the most prevalent coding algorithms in voice over IP (VoIP), and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated packet loss rates. Furthermore, the improvement grows as network conditions worsen.

    End-to-end Recurrent Denoising Autoencoder Embeddings for Speaker Identification

    Speech 'in the wild' is challenging for speaker recognition systems due to the variability induced by real-life conditions, such as environmental noise and emotions in the speaker. Taking advantage of representation learning, in this paper we aim to design a recurrent denoising autoencoder that extracts robust speaker embeddings from noisy spectrograms to perform speaker identification. The proposed end-to-end architecture uses a feedback loop to encode information regarding the speaker into low-dimensional representations extracted by a spectrogram denoising autoencoder. We employ data augmentation techniques by additively corrupting clean speech with real-life environmental noise and make use of a database with real stressed speech. We show that the joint optimization of both the denoiser and the speaker identification module outperforms independent optimization of both modules under stress and noise distortions, as well as hand-crafted features. Comment: 8 pages + 2 of references + 5 of images. Submitted on Monday 20th of July to Elsevier Signal Processing Short Communication.

    Speech recognition by transparameterization: a robust alternative for mobile and IP environments (Reconocimiento de habla mediante transparametrización: una alternativa robusta para entornos móviles e IP)

    Nowadays, the two most important types of telecommunication networks worldwide are mobile networks and packet networks based on TCP/IP (Transmission Control Protocol / Internet Protocol). Among the factors that have made the former successful in their second generation (2G) is their ubiquity: the enormous geographic deployment of these networks makes it possible to place a phone call from virtually any location (in the developed world). For their part, IP networks (originally designed for data transport) are progressively displacing other kinds of fixed networks, and one of their most remarkable advantages is, no doubt, their ability, still quite limited, to transmit any kind of multimedia information. One point of convergence between the two networks is thus their aim of allowing any kind of information to be sent over them with a certain quality of service (QoS). This is motivated by the huge range of new applications that can be created by combining multiple types of information (text, video, voice, images, music, etc.), and this is where speech technologies are called to play a fundamental role by providing more natural interfaces. Among these technologies, speech recognition is attaining enough maturity to make these developments increasingly viable.
In fact, in recent years much attention has been paid to the robustness of these systems when they are transferred to the real world, and many techniques have been developed to cope with problems such as variations in the acoustic environment, the influence of transducers and transmission channels, and variations in the speaker and the task. In this thesis we analyse the influence of two specific types of transmission channel, which represent the two types of network introduced above: the European mobile communications standard GSM (Global System for Mobile, formerly Groupe Spécial Mobile) and present-day TCP/IP-based networks. In addition, we propose an alternative solution, which we have named recognition by transparameterization, with which we improve recognition performance in both environments and which, though tested specifically on those particular networks, is also potentially applicable to others. A common characteristic of speech transmission over these networks is the prior speech coding process that takes place to reduce the bit rate. This lossy process produces a drop in quality which, though acceptable for human listeners (codecs are designed to minimize perceptual distortion), can noticeably harm automatic speech recognizers. Moreover, the transmission errors that occur in both networks also contribute to degrading recognizer accuracy. In the GSM environment, these errors appear as bursts of erroneous bits produced by fading of the radio-frequency signal, which can affect one or more consecutive frames, either in part or completely. The case of IP-based networks is somewhat different because, generally speaking, isolated or burst bit errors are rarely encountered thanks to the high reliability of the channel; instead, it is common for routing nodes to drop packets (in bursts) during congestion.
In any case, what is emphasized here is that errors occurring on the coded speech signal produce effects that cannot be treated in the same way as if they had occurred on the original signal (for example, by modelling the errors under the usual assumptions of convolutive or additive noise). In other words, if we take into account that the speech coding process consists, roughly speaking, in the extraction of a series of parameters representing various aspects of this particular kind of signal (its fundamental period, formant positions, voicing mode, energy, etc.), it becomes apparent that the modification of each of them affects the reconstructed speech in a different manner. A conventional speech recognizer receiving a coded signal first decodes it and only then performs the usual feature extraction. In this process, coding and error distortions are transferred to the recognition parameterization, ultimately reducing recognition rates. To improve this situation, we propose analysing the speech codec parameterization before decoding and transforming it into one suitable for recognition. This also allows us to use recognition-oriented error recovery and parameterization transformation methods that are not restricted to the ones provided by the coding standards, whose aim is to recover perceptually acceptable signals subject to very strict real-time constraints. We thus obtain a method applicable to both environments (GSM and IP) that reduces the impact on the recognizer of, on the one hand, the coding distortion, by selecting the information relevant to recognition embedded in the speech coding parameterization, and, on the other hand, the transmission errors, by acting directly on the affected parameters.
It is worth mentioning that the effectiveness of this method in both mobile and IP network environments can be an advantage for its application where the two are combined, which seems to be the trend.

    Towards the algebraization of Formal Concept Analysis over complete dioids

    Proceedings of: XVII Congreso Español sobre Tecnologías y Lógica Fuzzy (ESTYLF 2014). Zaragoza, 5-7 February 2014. Complete dioids are already complete residuated lattices. Formal contexts with entries in them generate Concept Lattices with the help of the polar maps. Previous work has already established the spectral nature of some formal concepts for contexts over certain kinds of dioids. This paper tries to raise awareness that linear algebra over exotic semirings should be one place to look in order to understand the properties of FCA over L-lattices. FJVA was partially supported by EU FP7 project LiMoSINe (contract 288024) for this research. CPM was partially supported by the Spanish Government-Comisión Interministerial de Ciencia y Tecnología project 2011-268007/TEC.
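For orientation only, the polar maps can be sketched in the standard Boolean case, which is not the dioid-valued construction the paper studies. The function names and the toy context below are hypothetical, and the brute-force enumeration is an assumption made for brevity; it is only practical for very small contexts.

```python
from itertools import combinations

def extent_to_intent(context, objects):
    """Polar of a set of objects: the attributes shared by all of them."""
    n_attrs = len(context[0])
    return frozenset(a for a in range(n_attrs)
                     if all(context[g][a] for g in objects))

def intent_to_extent(context, attributes):
    """Polar of a set of attributes: the objects that have all of them."""
    return frozenset(g for g in range(len(context))
                     if all(context[g][a] for a in attributes))

def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of objects.

    Exponential in the number of objects: a toy illustration, not an
    efficient concept-lattice algorithm.
    """
    concepts = set()
    for size in range(len(context) + 1):
        for subset in combinations(range(len(context)), size):
            intent = extent_to_intent(context, subset)
            extent = intent_to_extent(context, intent)
            concepts.add((extent, intent))
    return concepts

# Hypothetical Boolean context: rows are objects, columns are attributes.
toy_context = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]

for extent, intent in sorted(formal_concepts(toy_context),
                             key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each printed pair is a formal concept: a set of objects and a set of attributes that are each other's polars.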

    Two Information-Theoretic Tools to Assess the Performance of Multi-class Classifiers

    We develop two tools to analyze the behavior of multiple-class, or multi-class, classifiers by means of entropic measures on their confusion matrix or contingency table. First, we obtain a balance equation on the entropies that captures interesting properties of the classifier. Second, by normalizing this balance equation we obtain first a 2-simplex in a three-dimensional entropy space and then the de Finetti entropy diagram, or entropy triangle. We also give examples of the assessment of classifiers with these tools. Spanish Government-Comisión Interministerial de Ciencia y Tecnología projects 2008-06382/TEC and 2008-02473/TEC and the regional projects S-505/TIC/0223 (DGUI-CM) and CCG08-UC3M/TIC-4457 (Comunidad Autónoma de Madrid – UC3M).
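The abstract does not state the balance equation itself. The sketch below assumes the form in which it is usually given, H(U_X·U_Y) = ΔH + 2·MI + VI, where H(U_X·U_Y) is the entropy of the uniform distributions on inputs and outputs, ΔH the divergence of the marginals from uniformity, MI the mutual information and VI the variation of information. The function names are hypothetical, and rows are assumed to be true classes and columns predicted classes.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_triangle_coords(confusion):
    """Return normalized (delta_h, 2 * mi, vi), which sum to 1.

    Assumes at least two classes, so that the uniform entropy is non-zero.
    """
    counts = np.asarray(confusion, dtype=float)
    joint = counts / counts.sum()
    h_x = entropy(joint.sum(axis=1))   # entropy of the true-class marginal
    h_y = entropy(joint.sum(axis=0))   # entropy of the predicted-class marginal
    h_xy = entropy(joint.ravel())      # joint entropy
    h_uniform = np.log2(counts.shape[0]) + np.log2(counts.shape[1])
    mi = h_x + h_y - h_xy              # mutual information
    vi = h_xy - mi                     # H(X|Y) + H(Y|X)
    delta_h = h_uniform - (h_x + h_y)  # distance of the marginals from uniform
    return delta_h / h_uniform, 2 * mi / h_uniform, vi / h_uniform

# Hypothetical confusion matrices: a perfect balanced classifier and a
# classifier that always predicts the majority class.
print(entropy_triangle_coords([[50, 0], [0, 50]]))
print(entropy_triangle_coords([[90, 0], [10, 0]]))
```

Because the three coordinates sum to one, each confusion matrix maps to a point of a 2-simplex, which is what allows it to be drawn as a triangle.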

    Spectral Lattices of reducible matrices over completed idempotent semifields

    Proceedings of: 10th International Conference on Concept Lattices and Their Applications (CLA 2013). La Rochelle, France, October 15-18, 2013. Previous work has shown a relation between L-valued extensions of FCA and the spectra of some matrices related to L-valued contexts. We investigate the spectra of reducible matrices over completed idempotent semifields in the framework of dioids, naturally-ordered semirings that encompass several of those extensions. Considering special sets of eigenvectors also brings complete lattices into the picture, and we argue that such structure may be more important than standard eigenspace structure for matrices over completed idempotent semifields. FJVA is supported by EU FP7 project LiMoSINe (contract 288024). CPM has been partially supported by the Spanish Government-Comisión Interministerial de Ciencia y Tecnología project TEC2011-26807 for this paper.

    Towards Galois Connections over Positive Semifields

    In this paper we try to extend the Galois connection construction of K-Formal Concept Analysis to handle semifields which are not idempotent. Important examples of such algebras are the extended non-negative reals and the extended non-negative rationals, but we provide a construction that suggests that such semifields are much more abundant than suspected. This would enormously broaden the scope and applications of K-Formal Concept Analysis. CPM & FVA have been partially supported by the Spanish Government-MinECo projects TEC2014-53390-P and TEC2014-61729-EX.

    100% classification accuracy considered harmful: The normalized information transfer factor explains the accuracy paradox

    The most widespread measure of performance, accuracy, suffers from a paradox: predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. Despite optimizing classification error rate, high-accuracy models may fail to capture crucial information transfer in the classification task. We present evidence of this behavior by means of a combinatorial analysis in which every possible contingency matrix of 2-, 3- and 4-class classifiers is depicted on the entropy triangle, a more reliable information-theoretic tool for classification assessment. Motivated by this, we develop from first principles a measure of classification performance that takes into consideration the information learned by classifiers. We are then able to obtain the entropy-modulated accuracy (EMA), a pessimistic estimate of the expected accuracy with the influence of the input distribution factored out, and the normalized information transfer factor (NIT), a measure of how efficient the transmission of information from the input to the output set of classes is. The EMA is a more natural measure of classification performance than accuracy when the heuristic to maximize is the transfer of information through the classifier instead of classification error count. The NIT factor measures the effectiveness of the learning process in classifiers and also makes it harder for them to "cheat" using techniques like specialization, while also promoting the interpretability of results. Their use is demonstrated in a mind reading task competition that aims at decoding the identity of a video stimulus based on magnetoencephalography recordings.
We show how the EMA and the NIT factor reject rankings based on accuracy, choosing more meaningful and interpretable classifiers. Francisco José Valverde-Albacete has been partially supported by EU FP7 project LiMoSINe (contract 288024): www.limosine-project.eu. Carmen Peláez Moreno has been partially supported by the Spanish Government-Comisión Interministerial de Ciencia y Tecnología project TEC2011–26807.
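The accuracy paradox described above can be illustrated with a small numerical sketch. It does not compute the paper's EMA or NIT factor, whose definitions are not given in the abstract; it uses plain mutual information as a stand-in for "information transfer". The confusion matrices are hypothetical, with rows assumed to be true classes and columns predicted classes.

```python
import numpy as np

def accuracy(confusion):
    """Fraction of instances on the diagonal of the confusion matrix."""
    counts = np.asarray(confusion, dtype=float)
    return float(np.trace(counts) / counts.sum())

def mutual_information(confusion):
    """Mutual information in bits between true and predicted classes."""
    counts = np.asarray(confusion, dtype=float)
    joint = counts / counts.sum()
    p_true = joint.sum(axis=1, keepdims=True)
    p_pred = joint.sum(axis=0, keepdims=True)
    independent = p_true @ p_pred   # joint distribution under independence
    mask = joint > 0                # skip empty cells (0 * log 0 = 0)
    return float((joint[mask] * np.log2(joint[mask] / independent[mask])).sum())

# Hypothetical classifiers on a 90/10 imbalanced two-class task.
majority = [[90, 0], [10, 0]]   # always predicts the majority class
informed = [[75, 15], [1, 9]]   # makes more errors, but detects the rare class

for name, matrix in [("majority", majority), ("informed", informed)]:
    print(name, accuracy(matrix), mutual_information(matrix))
```

Under these assumptions the majority-class predictor has the higher accuracy yet transfers no information, while the second classifier has lower accuracy and strictly positive mutual information.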